-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: Improve cluster connection pool logic when disconnecting #1864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Improve cluster connection pool logic when disconnecting #1864
Conversation
… to the connection pool instance
|
I now created a separate repository that (hopefully) makes it easy to reproduce the bug. We have been using the fix in this branch in production throughout the last roughly 3 months and it has considerably reduced the error rates we are seeing when shutting down Bull queue clients. |
This reverts commit 2979176.
…e to connect using the Cluster client
|
I just pushed the fixes identified in valkey-io/iovalkey#5. |
|
This pull request has been automatically marked as stale due to inactivity. |
Motivation and Background
This is an attempt to fix errors occurring when a
connect()call is made shortly after adisconnect(), which is something that the Bull library does when pausing a queue.Here's a relatively minimal way to reproduce an error:
Running that script in a loop using
against the
mainbranch ofioredisquickly results in this output:My debugging led me to believe that the existing node cleanup logic in the
ConnectionPoolclass leads to race conditions: upondisconnect(), the this.connectionPool.reset() call will remove nodes from the pool without cleaning up the event listener which may then subsequently issue more than onedrainevent. Depending on timing, one of the extradrainevents may fire afterconnect()and change the status toclose, interfering with the connection attempt and leading to the error above.Changes
ConnectionPoolclass and remove them from the nodes whenever they are removed from the pool.-node/drainregardless of whether nodes disconnected or were removed through areset()call.reset(), add nodes before removing old ones to avoid unwanteddrainevents.thispoint to the connection pool instance.mainis seemingly different from the error shown above but it still seems related to the disconnection logic and still gets fixed by the changes in this PR.